AITopics | topic label

Collaborating Authors

topic label

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Efficient Topic Extraction via Graph-Based Labeling: A Lightweight Alternative to Deep Models

Mekaoui, Salma, Sofyan, Hiba, Amaaz, Imane, Benchrif, Imane, Zarghili, Arsalane, Chaker, Ilham, Nikolov, Nikola S.

arXiv.org Artificial IntelligenceNov-7-2025

Extracting topics from text has become an essential task, especially with the rapid growth of unstructured textual data. Most existing works rely on highly computational methods to address this challenge. In this paper, we argue that probabilistic and statistical approaches, such as topic modeling (TM), can offer effective alternatives that require fewer computational resources. TM is a statistical method that automatically discovers topics in large collections of unlabeled text; however, it produces topics as distributions of representative words, which often lack clear interpretability. Our objective is to perform topic labeling by assigning meaningful labels to these sets of words. To achieve this without relying on computationally expensive models, we propose a graph-based approach that not only enriches topic words with semantically related terms but also explores the relationships among them. By analyzing these connections within the graph, we derive suitable labels that accurately capture each topic's meaning. We present a comparative study between our proposed method and several benchmarks, including ChatGPT-3.5, across two different datasets. Our method achieved consistently better results than traditional benchmarks in terms of BERTScore and cosine similarity and produced results comparable to ChatGPT-3.5, while remaining computationally efficient. Finally, we discuss future directions for topic labeling and highlight potential research avenues for enhancing interpretability and automation.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2511.04248

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.93)

Industry: Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Topic-Guided Reinforcement Learning with LLMs for Enhancing Multi-Document Summarization

Li, Chuyuan, Xu, Austin, Joty, Shafiq, Carenini, Giuseppe

arXiv.org Artificial IntelligenceSep-15-2025

A key challenge in Multi-Document Summarization (MDS) is effectively integrating information from multiple sources while maintaining coherence and topical relevance. While Large Language Models have shown impressive results in single-document summarization, their performance on MDS still leaves room for improvement. In this paper, we propose a topic-guided reinforcement learning approach to improve content selection in MDS. We first show that explicitly prompting models with topic labels enhances the informativeness of the generated summaries. Building on this insight, we propose a novel topic reward within the Group Relative Policy Optimization (GRPO) framework to measure topic alignment between the generated summary and source documents. Experimental results on the Multi-News and Multi-XScience datasets demonstrate that our method consistently outperforms strong baselines, highlighting the effectiveness of leveraging topical cues in MDS.

computational linguistic, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2509.09852

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Tell, Don't Show: Leveraging Language Models' Abstractive Retellings to Model Literary Themes

Lucy, Li, Griffiths, Camilla, Levine, Sarah, Eberhardt, Jennifer L., Demszky, Dorottya, Bamman, David

arXiv.org Artificial IntelligenceMay-30-2025

Conventional bag-of-words approaches for topic modeling, like latent Dirichlet allocation (LDA), struggle with literary text. Literature challenges lexical methods because narrative language focuses on immersive sensory details instead of abstractive description or exposition: writers are advised to "show, don't tell." We propose Retell, a simple, accessible topic modeling approach for literature. Here, we prompt resource-efficient, generative language models (LMs) to tell what passages show, thereby translating narratives' surface forms into higher-level concepts and themes. By running LDA on LMs' retellings of passages, we can obtain more precise and informative topics than by running LDA alone or by directly asking LMs to list topics. To investigate the potential of our method for cultural analytics, we compare our method's outputs to expert-guided annotations in a case study on racial/cultural identity in high school English language arts books.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.23166

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.67)

Industry:

Law > Civil Rights & Constitutional Law (0.94)
Education > Educational Setting > K-12 Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection

Zhu, Xiaoxuan, Gu, Zhouhong, Wu, Baiqian, Zheng, Suhang, Wang, Tao, Li, Tianyu, Feng, Hongwei, Xiao, Yanghua

arXiv.org Artificial IntelligenceApr-22-2025

Pre-training large language models (LLMs) necessitates enormous diverse textual corpora, making effective data selection a key challenge for balancing computational resources and model performance. Current methodologies primarily emphasize data quality metrics and mixing proportions, yet they fail to adequately capture the underlying semantic connections between training samples and quality disparities within individual domains. We introduce ToReMi (Topic-based Reweighting for Model improvement), a novel two-stage framework that dynamically adjusts training sample weights according to their topical associations and observed learning patterns. Our comprehensive experiments reveal that ToReMi variants consistently achieve superior performance over conventional pre-training approaches, demonstrating accelerated perplexity reduction across multiple domains and enhanced capabilities on downstream evaluation tasks. Code is available at https://github.com/zxx000728/ToReMi.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2504.00695

Country: Asia > Middle East (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment

Embar, Varsha, Shrivastava, Ritvik, Damodaran, Vinay, Mehlinger, Travis, Hsiao, Yu-Chung, Raghunathan, Karthik

arXiv.org Artificial IntelligenceMar-24-2025

Large Language Models have transformed the Contact Center industry, manifesting in enhanced self-service tools, streamlined administrative processes, and augmented agent productivity. This paper delineates our system that automates call driver generation, which serves as the foundation for tasks such as topic modeling, incoming call classification, trend detection, and FAQ generation, delivering actionable insights for contact center agents and administrators to consume. We present a cost-efficient LLM system design, with 1) a comprehensive evaluation of proprietary, open-weight, and fine-tuned models and 2) cost-efficient strategies, and 3) the corresponding cost analysis when deployed in production environments.

call driver, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.1909

Country: Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Neural Topic Modeling with Large Language Models in the Loop

Yang, Xiaohao, Zhao, He, Xu, Weijie, Qi, Yuanyuan, Lu, Jueqing, Phung, Dinh, Du, Lan

arXiv.org Artificial IntelligenceDec-16-2024

Topic modeling is a fundamental task in natural language processing, allowing the discovery of latent thematic structures in text corpora. While Large Language Models (LLMs) have demonstrated promising capabilities in topic discovery, their direct application to topic modeling suffers from issues such as incomplete topic coverage, misalignment of topics, and inefficiency. To address these limitations, we propose LLM-ITL, a novel LLM-in-the-loop framework that integrates LLMs with Neural Topic Models (NTMs). In LLM-ITL, global topics and document representations are learned through the NTM. Meanwhile, an LLM refines these topics using an Optimal Transport (OT)-based alignment objective, where the refinement is dynamically adjusted based on the LLM's confidence in suggesting topical words for each set of input words. With the flexibility of being integrated into many existing NTMs, the proposed approach enhances the interpretability of topics while preserving the efficiency of NTMs in learning topics and document representations. Extensive experiments demonstrate that LLM-ITL helps NTMs significantly improve their topic interpretability while maintaining the quality of document representation. Our code and datasets will be available at Github.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2411.08534

Country:

Oceania > Australia (0.04)
North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Generative AI for automatic topic labelling

Kozlowski, Diego, Pradier, Carolina, Benz, Pierre

arXiv.org Artificial IntelligenceAug-13-2024

Topic Modeling has become a prominent tool for the study of scientific fields, as they allow for a large scale interpretation of research trends. Nevertheless, the output of these models is structured as a list of keywords which requires a manual interpretation for the labelling. This paper proposes to assess the reliability of three LLMs, namely flan, GPT-4o, and GPT-4 mini for topic labelling. Drawing on previous research leveraging BERTopic, we generate topics from a dataset of all the scientific articles (n=34,797) authored by all biology professors in Switzerland (n=465) between 2008 and 2020, as recorded in the Web of Science database. We assess the output of the three models both quantitatively and qualitatively and find that, first, both GPT models are capable of accurately and precisely label topics from the models' output keywords. Second, 3-word labels are preferable to grasp the complexity of research topics.

iteration, representation, similarity, (17 more...)

arXiv.org Artificial Intelligence

2408.07003

Country:

Europe > Switzerland (0.25)
North America > Canada > Quebec > Montreal (0.05)
Asia > South Korea > Daegu > Daegu (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.51)

Add feedback

TopicTag: Automatic Annotation of NMF Topic Models Using Chain of Thought and Prompt Tuning with LLMs

Wanna, Selma, Barron, Ryan, Solovyev, Nick, Eren, Maksim E., Bhattarai, Manish, Rasmussen, Kim, Alexandrov, Boian S.

arXiv.org Artificial IntelligenceJul-28-2024

Topic modeling is a technique for organizing and extracting themes from large collections of unstructured text. Non-negative matrix factorization (NMF) is a common unsupervised approach that decomposes a term frequency-inverse document frequency (TF-IDF) matrix to uncover latent topics and segment the dataset accordingly. While useful for highlighting patterns and clustering documents, NMF does not provide explicit topic labels, necessitating subject matter experts (SMEs) to assign labels manually. We present a methodology for automating topic labeling in documents clustered via NMF with automatic model determination (NMFk). By leveraging the output of NMFk and employing prompt engineering, we utilize large language models (LLMs) to generate accurate topic labels. Our case study on over 34,000 scientific abstracts on Knowledge Graphs demonstrates the effectiveness of our method in enhancing knowledge management and document organization.

automatic annotation, thought and prompt tuning, topic label, (12 more...)

arXiv.org Artificial Intelligence

2407.19616

Country:

North America > United States > California > Santa Clara County > San Jose (0.15)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.05)
(7 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Add feedback

Interactive Topic Models with Optimal Transport

Dhanania, Garima, Mysore, Sheshera, Pham, Chau Minh, Iyyer, Mohit, Zamani, Hamed, McCallum, Andrew

arXiv.org Artificial IntelligenceJun-28-2024

Topic models are widely used to analyze document collections. While they are valuable for discovering latent topics in a corpus when analysts are unfamiliar with the corpus, analysts also commonly start with an understanding of the content present in a corpus. This may be through categories obtained from an initial pass over the corpus or a desire to analyze the corpus through a predefined set of categories derived from a high level theoretical framework (e.g. political ideology). In these scenarios analysts desire a topic modeling approach which incorporates their understanding of the corpus while supporting various forms of interaction with the model. In this work, we present EdTM, as an approach for label name supervised topic modeling. EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities and using optimal transport for making globally coherent topic-assignments. In experiments, we show the efficacy of our framework compared to few-shot LLM classifiers, and topic models based on clustering and LDA. Further, we show EdTM's ability to incorporate various forms of analyst feedback and while remaining robust to noisy analyst inputs.

assignment, topic assignment, topic modeling, (14 more...)

arXiv.org Artificial Intelligence

2406.19928

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > New York > New York County > New York City (0.05)
Asia > Middle East > Jordan (0.05)
(12 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)

Add feedback

Exploring a Hybrid Deep Learning Framework to Automatically Discover Topic and Sentiment in COVID-19 Tweets

Shahriar, Khandaker Tayef, Sarker, Iqbal H.

arXiv.org Artificial IntelligenceDec-2-2023

COVID-19 has created a major public health problem worldwide and other problems such as economic crisis, unemployment, mental distress, etc. The pandemic is deadly in the world and involves many people not only with infection but also with problems, stress, wonder, fear, resentment, and hatred. Twitter is a highly influential social media platform and a significant source of health-related information, news, opinion and public sentiment where information is shared by both citizens and government sources. Therefore an effective analysis of COVID-19 tweets is essential for policymakers to make wise decisions. However, it is challenging to identify interesting and useful content from major streams of text to understand people's feelings about the important topics of the COVID-19 tweets. In this paper, we propose a new \textit{framework} for analyzing topic-based sentiments by extracting key topics with significant labels and classifying positive, negative, or neutral tweets on each topic to quickly find common topics of public opinion and COVID-19-related attitudes. While building our model, we take into account hybridization of BiLSTM and GRU structures for sentiment analysis to achieve our goal. The experimental results show that our topic identification method extracts better topic labels and the sentiment analysis approach using our proposed hybrid deep learning model achieves the highest accuracy compared to traditional models.

sentiment analysis, springer nature 2021, tweet, (12 more...)

arXiv.org Artificial Intelligence

2312.01178

Country:

Asia > Singapore (0.04)
Oceania > Australia > Western Australia > Perth (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(5 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback